VH replacement (VHR) is a type of antibody gene rearrangement in which an upstream
heavy chain variable gene segment (VH) invades a pre-existing rearrangement (VDJ). In this
Hypothesis and Theory article, we begin by reviewing the mechanism of VHR, its developmental
timing, and its potential biological consequences. Then we explore the hypothesis
that specific sequence motifs called footprints reflect VHR versus other processes. We
provide a compilation of footprint sequences from different regions of the antibody heavy
chain, and include data from the literature and from a high throughput sequencing experiment
to evaluate the significance of footprint sequences. We conclude by discussing the
difficulties of attributing footprints to VHR.
CONTEXT, DEFINITION, AND POTENTIAL MECHANISMS OF
VH REPLACEMENT 

Antibodies are heterotetrameric proteins composed of two heavy
chains and two light chains. Their variable regions are assembled
through V(D)J recombination, which generates a highly diverse
repertoire of antigen binding receptors expressed by B cells. The
recombinase activating gene encoded proteins, RAG1 and RAG2, target
the conserved heptamers and nonamers within recombination signal
sequences (RSSs) and cleave the DNA that flanks the recombining gene
segments, which then join to form the variable regions of antibody
heavy and light chains [reviewed in Ref. (1)]. Typical V(D)J
recombination generates a signal joint and a coding joint, and the
latter is further diversified at the junction between the recombining
gene segments by mechanisms including P-addition, N-addition, and
exonucleolytic nibbling [reviewed in Ref. (2)]. Occasionally atypical
rearrangements occur, generating hybrid joints, open-and-shut joints, or
joints between RSSs that ordinarily do not recombine (2-5).

Antibodies can be further revised and diversified through
receptor editing of the light chain, somatic hypermutation, gene
conversion, and VH replacement (VHR). Receptor editing typically
involves RAG-dependent leapfrogging rearrangements on the
same allele as the defective or autoreactive light chain, rearrangement
on other alleles (κ or λ), and/or RS deletion [which renders the
preceding κ rearrangement non-functional; reviewed in Ref. (6)].



Abbreviations: CDR3, third complementarity determining region; cRSS, cryptic 
recombination signal sequence; DH, antibody heavy chain diversity gene segment; 
footprint 5-mer, pentameric nucleotide sequence attributed to VHR; GVHD, graft 
versus host disease; IgH, antibody heavy chain; JH, antibody heavy chain joining gene 
segment; RSS, recombination signal sequence; SLE, systemic lupus erythematosus; 
VH, antibody heavy chain variable gene segment; VHR, VH replacement. 



Somatic hypermutation is the introduction of DNA point mutations
by activation induced cytidine deaminase (AID) (7), and typically
signifies a T-cell dependent antibody response. Gene conversion,
in which homologous sequences from other V genes are grafted
into the functional V gene, is a common method of gene diversification
in chickens (8) and rabbits; more recent examples have
been described in horses and humans (9), and the process appears to be
AID-dependent (10). The final category of antibody gene diversification
is VHR, which is the focus of this article. Replacement involves the
transfer (or invasion) of some or most of another V gene into an
existing gene rearrangement.

Darlow and Stott have reviewed the literature on VHR and
envision two broad mechanistic classes of V replacement (11).
The first, also termed "classical" VHR, consists of invasion of an
existing VDJ rearrangement by an upstream VH. In classical VHR
there is RAG-mediated cleavage at a cryptic RSS (cRSS) located
in the 3' end of the previously rearranged VH gene. The cRSS,
5'-TACTGTG-3' (12), differs by one nucleotide from the conventional
heptamer that flanks the DH gene segment, and is found in
~70% of murine VHs and over 90% of human VHs (13). Occasionally
other heptamers containing the 3' GTG nucleotides can
be used, suggesting that the last three nucleotides of the cRSS
motif are critical (14, 15). The TGT within the cRSS is the codon
encoding the conserved cysteine at the junction between FR3 and
CDR3. The second class of replacement, according to Darlow
and Stott, involves the transfer of other sequences of homology
between different V genes at different sites, many of which
also appear to resemble cRSSs. Examples of this second category
of VHR have been described in antibodies cloned from
single B cells in human tonsils (16), in antibodies cloned from
FIGURE 1 | (A) VH replacement: an upstream VH gene invades by
rearranging into a pre-existing rearrangement. RAG cleaves the 
conventional recombination signal sequence (black triangle) of the invading 
VH (light blue VH gene) and cleaves at a cryptic heptamer sequence (cRSS, 
dashed white triangle) of the invaded VH gene (yellow VH gene). The 
resulting rearrangement is shown on the second line of the diagram and 
includes the DH and JH genes of the previous rearrangement and the new 
VH gene. Also included in the VH replacement product is a remnant or 
"footprint" of the preceding VH gene (denoted by a yellow box). Often the 
products of VH replacement exhibit CDR3 elongation, due to the retention 
of the footprint sequence (the CDR3 is indicated by the bar under the 
sequence and the added length of the new CDR3 sequence including the 
footprint (in red) is indicated by the black bar below the sequence.) 
(B) Serial VH replacement. The same conventions are used as in (A) and a 
longer CDR3 is generated, via the accumulation of footprint sequences. In 
both panels, boxes denote exons, lines introns, triangles RSS and the 
rearrangement is indicated by dashed black lines. This diagram is not drawn 
to scale. (C) Long CDR3 sequence with possible VH replacement(s). Shown 
is the nucleotide sequence of an expanded B cell clone that was recovered 
from peripheral blood DNA of a patient with systemic lupus erythematosus 
(SLE) that reveals a 91 nucleotide CDR3. Kowal et al. described an 
anti-dsDNA H chain sequence comprised of a VH3-N-DH2-2-JH6, which has 
similar features to this junction, although it was shorter (21). Sequences in
black font match the corresponding germline gene segments. Red font 
denotes possible N-additions and yellow shading highlights potential 
footprint sequences. Dashes indicate regions where sequences do not 
overlap. FR3, framework region 3; CDR3, third complementarity 
determining sequence; FR4, framework region 4. 



synovial tissue of patients with rheumatoid arthritis (17), and in 
antibodies cloned from human mucosa associated lymphoid tissue 
lymphomas (18). Alternatively or in addition to RAG-mediated 
rearrangement, replacements in this second category may arise 
due to AID-mediated homologous recombination events that 
are unrelated to the putative cRSSs (11). However, the mechanism
of type 2 replacement is far from resolved, as a
non-AID-dependent form of replacement has recently been described
at the κ locus using human pre-B cell lines (19). As the molecular
mechanism of type 2 replacement remains to be fully
elucidated, we focus the remainder of our analysis in this
manuscript on classical VHR (which we refer to hereafter as "VH
replacement").

During VHR, an upstream VH gene invades into the cRSS,
replacing all but the last few nucleotides of the previously
rearranged VH gene (Figure 1A). The remaining 3' nucleotides
of the VH and the DH and JH gene segments are retained in the
new rearrangement. The extra nucleotides from the 3' end of
the previous VH gene are sometimes referred to as a "footprint."
Nearly all human VH genes have between five and nine
nucleotides in the potential footprint, between the cRSS and
the RSS. Most primary RSS rearrangements delete several of
these nucleotides from the 3' end, so the potential footprint
may not be easily recognizable. Moreover, during VHR, additional
nucleotides can be deleted, so the footprint from the primary
VH can be entirely lost. It is also possible
for more than one replacement rearrangement to occur on the
same heavy chain allele, a process referred to as "serial" or
"successive" VHR (Figure 1B) (20). An example of a heavy chain
rearrangement with more than one footprint sequence is given
in Figure 1C.

DEMONSTRATION OF VH REPLACEMENT IN CELL LINES AND 
MOUSE MODELS 

VH replacement was initially discovered in two different transformed
B cell lines (12, 22). In both of these early studies, B cells
with non-functional heavy chain gene rearrangements (VDJ-)
were able to generate functional heavy chains (VDJ+) by undergoing
further heavy chain rearrangement into the cRSS. Continued
VHR could also convert a functional VDJ+ rearrangement into
a non-functional one through the incorporation of an upstream
pseudo-VH gene (12).

The development of antibody heavy chain (IgH) knock-in mice
provided a formal demonstration of VHR in B cells in vivo. VHR
was documented in hybridomas derived from the 3H9 heavy chain
knock-in mouse (13). VHR and invasion by upstream DH genes
also occurred in a knock-in for the T15 heavy chain (15). Furthermore,
B cells from quasi-monoclonal mice, which have an
anti-(4-hydroxy-3-nitrophenyl)acetyl (NP) heavy chain knock-in
and can only produce λ light chains due to homozygous
engineered κ deficiency, can lose reactivity to NP by VHR. Strikingly,
most secreted antibodies in the quasi-monoclonal mouse
appear to arise through VHR (23). VHR was also observed in mice
that were genetically engineered to contain two non-productively
rearranged heavy chain alleles. In these VDJ-/VDJ- mice, IgHs
were generated via VHR in a RAG-dependent manner (crossing
the VDJ-/VDJ- mice onto a RAG2 deficient background
resulted in a failure to generate IgM+ B cells) (24). The ability
of RAG1 and RAG2 to bind to the cRSS was also demonstrated
by electrophoretic mobility shift assays using VH4-34 cRSS versus
consensus 12-RSS sequences (25).

In all of the preceding mouse models, VHR conferred greater
diversity or functionality upon the B cell repertoire (i.e., there was
a selective pressure that favored VHR). In contrast, when VHR
was compared with conventional rearrangement, using a mouse
model with an out of frame VDJ rearrangement (VDJ-) that was
knocked into the heavy chain locus, conventional rearrangement
on the other heavy chain allele occurred far more frequently (26).
Similarly, in the 56R anti-dsDNA heavy chain knock-in mouse,
receptor editing was far more efficient in B cells that were
heterozygous rather than homozygous for 56R (27). One caveat to
the 56R study was that cells that had undergone VHR on one allele
but were still left with a functional copy of the DNA-reactive 56R
heavy chain on the other allele could be counter-selected.

VH REPLACEMENT IN BONE MARROW B CELLS 

To gain further insight into the mechanism of VHR, studies were 
performed in mice to determine its developmental timing. Several 
studies suggest that VHR occurs at or near the time of conventional
IgH gene rearrangement. The junctions of IgH sequences
with evidence of VHR in IgH knock-in mice usually contain
N-additions (13). Terminal deoxynucleotidyl transferase (TdT), the
enzyme that carries out N-addition, is typically expressed at high- 
est levels during H chain rearrangement in pro-B and large cycling 
pre-B cells (28). Therefore, the presence of N-additions provides 
indirect evidence that VHR occurred at the time when TdT was 
active and therefore probably took place in pro-B or early pre-B 
cells. Further evidence in support of VHR in early stage B cells 
includes ligation-mediated PCR to measure DNA breaks at the 
heavy chain locus, which occurred at the highest levels in pro- 
B cells (29). These studies suggest that VHR is either occurring 
in cells where IgH rearrangement has not yet shut down (failed 
allelic exclusion) or is driven by pre-BCR rather than BCR signal- 
ing, since only the former receptor is expressed at the pre-B cell 
stage of development. 

With respect to pre-BCR signaling [reviewed in Ref. (30)], it is 
noteworthy that surrogate light chain knock-out mice have autore- 
active antibodies with long CDR3 sequences (31). One potential 
explanation for this result is that, in the absence of surrogate light 
chain, the pre-BCR does not assemble and turn off heavy chain 
rearrangement. Without a heavy chain rearrangement stop signal, 
there may be higher frequencies of VHR, leading to CDR3 elongation.
However, an alternative possibility is that peripheral selection
of B cells with long CDR3 sequences is relaxed in the lymphopenic 
setting that arises due to inefficient primary B cell production 
in surrogate light chain knock-out mice. It is known that in the 
absence of normal numbers of peripheral B cells, the level of the B 
cell survival factor BLyS (also known as BAFF) increases, since B 
cells are the primary consumers of BLyS. It is also known that the 
stringency of B cell selection can be reduced when BLyS levels are 
increased (32, 33). 

VH REPLACEMENT IN PERIPHERAL B CELLS 
Some studies suggest that VHR could occur in more mature B
cell subsets. For example, there are data implicating BCR signaling
in VHR in the EU12 human B cell line, which phenotypically
resembles IgM+, CD10+, CD24high cells. In these cells, BCR
crosslinking promotes VHR and, conversely, Syk and Src kinase
inhibitors inhibit VHR (34). While some of the kinase inhibition
experiments could also be influencing mechanisms that operate at
earlier stages of B cell development, the BCR crosslinking experiment
suggests that BCR signaling could promote VHR in more
mature B cells. Furthermore, ligation-mediated PCR experiments
documented double-stranded DNA breaks at VH3 cRSS sites in
human immature (IgM+, CD27-, CD10+) and mature naive
(IgM+, CD27-, CD10-) circulating B cells, also suggesting that
VHR may not be limited to immature B cells (34).

Chronic graft versus host disease (GVHD) is one of the most
intriguing examples in which VHR could be occurring in more
mature B cells (35). B6 mice injected with I-A incompatible T cells
from bm12 mice develop chronic GVHD and produce a spectrum
of autoantibodies that resembles those found in systemic autoimmune
conditions such as systemic lupus erythematosus (SLE) (36).
When anti-dsDNA heavy chain knock-in mice such as 3H9 and
56R are used, GVHD occurs and the production of anti-nuclear
antibodies is enhanced (35). But the remarkable finding is that
among IgG antibodies, a large fraction does not use the knocked-in
heavy chain (35). Although this unexpected skewing away from
the 56R H chain could be the result of selective pressures on
the minority population of H chain edited B cells that emerge
from the bone marrow, it is not at all obvious how this selection
could operate to disfavor the transgene, and why its effects
would be largely confined to IgG and not IgM. It is possible that
the transgene was revised (by further gene rearrangement) in the
periphery, either because it was inactivated by somatic mutation
(37), or because the stimulus afforded by chronic GVHD re-induced the
rearrangement machinery. An alternative explanation is that the
56R transgene bearing cells are disfavored during primary B cell
maturation because they recognize DNA and this self-reactivity
causes them to be anergized (this would predict that 56R+ cells
would be over-represented amongst IgM rather than IgG B cells).
Consistent with the possibility of anergy, most B cells expressing
the IgM allotype of the 56R transgene have low levels of
IgM (38-40).

WHAT ARE THE CONSEQUENCES AND POTENTIAL 
FUNCTIONS OF VH REPLACEMENT? 

VH replacement allows a B cell with an inadequate pre-BCR or 
an autoreactive BCR to swap out the existing heavy chain and 
replace it with a different heavy chain. But why would this be 
useful? One possibility is that VHR increases the odds of gener- 
ating a functional antibody. Producing a functional antibody is 
rather difficult (41): many rearrangements are out of frame, VH
pseudogenes outnumber functional VH genes, many newly generated
antibodies are autoreactive (42), some combinations may be
sequestered inside the cells (38), and some H and L chain combinations
may not pair well with each other. VHR may also facilitate
the use of a wider array of upstream VH genes. By giving cells 
with defective antibody rearrangements a chance at revising those 
antibodies, perhaps the efficiency of primary B cell generation is 
greatly improved. 

On the other hand, a seemingly diametrically opposed con- 
sequence of VHR is the potential generation of multireactive 
antibodies. VHR can sometimes result in the retention of a "foot- 
print" that is comprised of DNA sequences downstream of the 
cryptic heptamer of the invaded VH gene (Figure 1). Because 
the cRSS is typically positioned further from the WGXG motif 
in the JH segment than the 5' RSS of a DH segment, VHR is
likely to produce longer CDR3 segments than primary rearrange- 
ments. Not surprisingly, longer CDR3 sequences have a higher 
FIGURE 2 | (A) Longer CDR3 sequences have more footprints. Plotted are
the average numbers of footprint 5-mers per sequence. At each CDR3 length,
the footprint count is averaged over the sequences that have CDR3s of
that particular length. Blue dots are in-frame (IF) rearrangements and red dots
are out of frame (OF) rearrangements. We find a positive linear correlation
between the length of the CDR3 and the number of footprints (r² = 0.89 for
IF, r² = 0.7 for OF). The line describing this relationship has a slope of
0.06 ± 0.008 for OF (red line) and 0.05 ± 0.008 for IF rearrangements (blue
line). (B,C) are two examples of the entire distribution of footprint numbers at
two positions: position 56 (red circle, OF) and position 69 (blue circle, IF).
Black stems indicate the numbers of footprints and red lines represent the fit
of a Poisson distribution (λ = 1.57 and 2.74, respectively).



proportion of footprints (Figure 2), but this does not guarantee
that all long CDR3s are the product of VHR. Seventy-eight percent
of the potential footprint regions in functional human
VH genes contain an arginine codon, so footprint-containing
sequences often also harbor a larger number of charged residues.
Longer CDR3s have been associated with greater multireactivity,
and such multireactive B cells are normally counter-selected
as B cells mature during normal B cell development (42). RA
patients have antibodies with unusual CDR3 sequences in their
synovium (17) and we have seen CDR3 sequences in patients
with SLE that have regions of sequence homology that could
arise due to VHR. For example, Figure 1C shows a rearrangement
from an expanded B cell clone in a patient with SLE that
appears to contain two footprint sequences (highlighted in yellow).
Autoimmune-prone strains of mice have elongated CDR3s,
although many of these may arise through mechanisms other
than VHR, such as D-D fusion (43, 44). All of these findings
raise the question of whether such "multireactivity" serves a useful
function. Is multireactivity protective, particularly in the context
of an innate immune response? Or could multireactive antibodies
be useful in clearing debris that might be inflammatory
if left to accumulate? It is intriguing in this regard that some
multireactive IgM antibodies such as the famous T15 idiotype,
which binds phosphorylcholine (45), also have anti-inflammatory
properties (46).

It is possible that there is no simple single answer to the func- 
tion of VHR, if it has one at all. It would certainly seem that 
the biological consequences of VHR depend upon the develop- 
mental context in which the rearrangement occurs. If replace- 
ment occurs centrally, as is likely to occur in wild type strains 
of mice such as B6 (40, 47), it could serve as a tolerance 



mechanism (receptor editing) or as a means of increasing the
efficiency of primary B cell generation. It might also gener- 
ate a portion of the primary antibody repertoire that has spe- 
cial functional properties such as multireactivity. Conversely, if 
it occurs peripherally, as might arise in dysregulated states of 
immune activation such as GVHD (48), perhaps autoimmunity 
results. 

VH REPLACEMENT IN pre-B CELL ALL 

Given the abundance of findings linking VHR to pro- or pre-B cell 
development discussed above, it is not surprising that the initial 
demonstrations of VHR occurred in transformed pre-B cell lines. 
More recently, VHR has been demonstrated to be a major contrib- 
utor to clonal evolution in precursor B cell acute lymphoblastic 
leukemia (B-ALL) (49, 50). In B-ALL, there is presumably a large 
clone of cells "frozen" in the pre-B cell stage. The recombinase 
machinery remains active in at least some of these cells and can 
drive VHR. It is instructive to review the early work in the murine 
pre-B cell line NFS5, in which VHR was found to alter not only 
the productive but also the non-productively rearranged allele 
(12). Thus assays where one attempts to define a clone based 
upon its predicted "conservation" of other immunoglobulin gene 
rearrangements (such as the other H chain allele) within the same 
cell are not necessarily reliable or easy to interpret. The poten- 
tial for VHR to contribute to intraclonal diversification is highly 
relevant to the design and interpretation of assays for minimal 
residual disease monitoring that employ quantitative PCR with 
probes or primers for clone-specific junctional sequences (51) or, 
more recently, high throughput sequencing of heavy chain CDR3 
(52). Such studies must take VHR and other forms of intraclonal 
diversification into account. 
Table 1 | Footprint sequences in the 3' end of human germline VH genes and alleles.



Footprint (5-mer variants) 

CGAGAGA (CGAGA, GAGAG, AGAGA)

CGAGAGG (CGAGA, AGAGG)
CGAGACA (CGAGA, GAGAC, AGACA)
CGAGATA (CGAGA, GAGAT, AGATA)
CGAGAAA (CGAGA, GAGAA, AGAAA)
CAAGANA (CAAGA, AAGAN, AGANA)



CAACAGA 

CTAGAGA (CTAGA, TAGAG, AGAGA) 

CTAGGGA (CTAGG, TAGGG, AGGGA)
CGAAAGA (CGAAA, GAAAG, AAAGA) 



CCAGATATA (CCAGA, CAGAT, AGATA, 
GATAT, ATATA) 

CCAGAGA (CCAGA, CAGAG, AGAGA) 
TGAAACA (TGAAA, GAAAC, AAACA) 

TGAGA 

TGAGAGA (TGAGA, GAGAG, AGAGA)

TGAGAAA (TGAGA, GAGAA, AGAAA)

TGAAAGA (TGAAA, GAAAG, AAAGA)

CGGCAGA (CGGCA, GGCAG, GCAGA) 

CACGGATAC (CACGG, ACGGA, CGGAT, 
GGATA, GATAC) 



VH gene allele(s) 

CGAGAGA VH1-18, VH1-2*1, VH1-2*2, VH1-2*3, VH1-2*5, VH1-3, VH1-46*1, VH1-46*2, VH1-69*1,
VH1-69*4, VH1-69*6, VH1-69*8, VH1-69*9, VH1-69*10, VH1-69*11, VH1-69*12, VH1-69*13,
VH1/OR15-1*2, VH1/OR15-1*3, VH1/OR15-1*4, VH3-11*1, VH3-11*4, VH3-11*5, VH3-21,
VH3-30*1, VH3-30*3, VH3-30*4, VH3-30*5, VH3-30*6, VH3-30*7, VH3-30*9, VH3-30*10,
VH3-30*11, VH3-30*12, VH3-30*13, VH3-30*14, VH3-30*15, VH3-30*16, VH3-30*17,
VH3-30*18, VH3-30*19, VH3-33*1, VH3-33*2, VH3-33*4, VH3-33*5, VH3-48, VH3-53*1,
VH3-53*4, VH3-64*1, VH3-64*2, VH3-64*4, VH3-66*1, VH3-66*3, VH3-7*1, VH3-7*3,
VH4-28*3, VH4-30-2*4, VH4-31*1, VH4-31*2, VH4-31*3, VH4-31*10, VH4-34*9, VH4-39*2,
VH4-39*6, VH4-39*7, VH4-4*2, VH4-4*6, VH4-4*7, VH4-59*1, VH4-59*2, VH4-61*1,
VH4-61*2, VH4-61*3, VH4-61*8, VH4/OR15-8, VH7-4-1*2, VH7-4-1*4, VH7-4-1*5

CGAGA VH1-2*4, VH1-69*2, VH1-69*5, VH1/OR15-1*1, VH3-11*3, VH3-30*8, VH3-30-3*1,
VH3-53*2, VH3-66*2, VH3-7*2, VH4-28*4, VH4-34*12, VH4-59*7, VH4-61*5, VH4-b,
VH5-51*3, VH5-51*4, VH5-a, VH7-4-1*1

VH1-8, VH4-34*1, VH4-34*2, VH4-34*4, VH4-34*5, VH4-34*13, VH4-59*9

VH3-66*4, VH4-30-2*3, VH4-39*1, VH4-59*8, VH4-61*7, VH5-51*1, VH5-51*2

VH4-34*10, VH4-59*10, VH7-81

VH4-28*1, VH4-28*2, VH4-28*5, VH4-28*6



CAAGANA 
CAAGATA 
CAAGAGA 
CAAGA 

CAACAGA 
CAACA 

CTAGAGA 
CTAGA 

VH3-53*3 

CGAAAGA 
CGAAA 
CGNNN 
CG 

VH3-38 



VH1-45*1
VH1-45*2

VH3-13*1, VH3-13*2, VH3-13*4, VH3-74*1, VH3-74*3, VH3/OR16-10*3, VH6-1
VH1-45*3, VH3-13*3, VH3-74*2, VH3/OR16-10*1, VH3/OR16-10*2, VH3/OR16-12

VH1-24
VH1-f*1

VH1-46*3, VH3-72*1, VH3/OR15-7*5
VH3/OR15-7*1, VH3/OR15-7*2, VH3/OR15-7*3

VH3-23*1, VH3-23*2, VH3-23*4, VH3-30*2, VH3-30-3*2, VH3-33*3, VH3-33*6, VH3-NL1
VH3-23*3, VH3-23*5

VH4-30-2*2, VH4-31*4, VH4-34*8, VH4-39*5, VH4-59*3, VH4-59*4, VH4-59*5, VH4-59*6,
VH4-31*5

VH4-30-2*1, VH4-30-2*5, VH4-30-4*1, VH4-30-4*2, VH4-30-4*5, VH4-30-4*6, VH4-61*6



TGAAA 
TGAAACA 



TGAGA 
TGAGAGA 



VH3/OR16-8*1, VH3/OR16-9
VH3/OR16-8*2

VH1/OR15-5

VH1/OR15-9, VH1/OR21-1

VH3-16, VH3-35
VH3-64*3, VH3-64*5
VH1-58

VH2-26, VH2-70*1, VH2-70*10, VH2-70*11



(Continued) 
Table 1 | Continued



Footprint (5-mer variants) 

CATGGAGAG (CATGG, ATGGA, TGGAG, 
GGAGA, GAGAG) 

TACGG 

TANNN 

CACGG 

CACACAGACC (CACAC, ACACA, 
CACAG, ACAGA, CAGAC, AGACC) 

CAAAAGATA (CAAAA, AAAAG, AAAGA, 
AAGAT, AGATA) 



VH gene allele(s) 

VH2/OR16-5 

VH2-5*4, VH2-70*9

VH2-5*7 

VH2-5*10 

CACACAGACC 
CACACAGAC 
CACACAGA 

VH3-43, VH3-9 



VH2-5*1 

VH2-5*5, VH2-5*8, VH2-5*9, VH2-70*12 
VH2-5*6 



Two hundred and seventy-three functional VH genes, including alleles and sequences designated as open reading frames, were downloaded from the IMGT database
(54) and manually scanned for footprints. A footprint is defined by the nucleotide sequence following the cryptic recombination signal sequence (cRSS), TACTGTG,
at the 3' end of each VH, and is listed in the left column of the table. The footprint 5-mer variants that were used to scan the sequences are included in parentheses.
The right column lists the VH genes and alleles. An asterisk in the VH name refers to a specific allele. If all alleles of a particular VH have the same footprint, allele
names are omitted. Overlapping footprints are listed for some footprints in the sub-column on the right. Footprints in red font are also found in germline DH and
JH gene segments. Sequences with ambiguous nucleotide designations (N-nucleotides) are indicated by italic font. The following VH gene alleles do not have a
cRSS: VH1-18*2, VH1-69*3, VH1-69*7, VH1-c, VH1-f*2, VH2-5*2, VH2-5*3, VH2-70*2, VH2-70*3, VH2-70*4, VH2-70*5, VH2-70*6, VH2-70*7, VH2-70*8, VH2-70*13,
VH3-15, VH3-20, VH3-25*4, VH3-49, VH3-72*2, VH3-73*1, VH3-73*2, VH3-d, VH3/OR16-13, VH3/OR16-6*2, VH4-30-4*3, VH4-30-4*4, VH4-31*6, VH4-31*7, VH4-31*8,
VH4-31*9, VH4-34*3, VH4-34*6, VH4-34*7, VH4-34*11, VH4-39*3, VH4-39*4, VH4-4*1, VH4-4*3, VH4-4*4, VH4-4*5, VH4-61*4, VH5-51*5, VH7-4-1*3.
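
The footprint definition in the Table 1 legend can be illustrated with a short sketch. This is not the authors' procedure (the legend says the genes were scanned manually); the function names and the example sequence are assumptions made for illustration, and only the cRSS heptamer TACTGTG and the 5-mer window size are taken from the text.

```python
from typing import List, Optional

CRSS = "TACTGTG"  # cryptic heptamer named in the Table 1 legend


def potential_footprint(vh_3prime: str, crss: str = CRSS) -> Optional[str]:
    """Return the nucleotides that follow the last cRSS in a germline VH,
    or None when the sequence contains no cRSS (hypothetical helper)."""
    seq = vh_3prime.upper()
    idx = seq.rfind(crss)
    if idx == -1:
        return None
    return seq[idx + len(crss):]


def five_mer_variants(footprint: str, k: int = 5) -> List[str]:
    """List every k-nucleotide window of a footprint, left to right,
    dropping repeated windows (hypothetical helper)."""
    variants: List[str] = []
    for i in range(len(footprint) - k + 1):
        kmer = footprint[i:i + k]
        if kmer not in variants:
            variants.append(kmer)
    return variants


# Illustrative 3' end of a VH gene; not copied from any database entry.
example_vh_end = "GCCGTGTATTACTGTGCGAGAGA"
footprint = potential_footprint(example_vh_end)
print(footprint, five_mer_variants(footprint))
```

For the example above the sketch yields the footprint CGAGAGA and the windows CGAGA, GAGAG, and AGAGA, which is the first row of Table 1. Footprints shorter than five nucleotides produce no 5-mer at all.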



ANALYSIS OF VH REPLACEMENT FOOTPRINTS 

The most convincing demonstrations of VHR are those in which a 
precursor-product relationship can be documented. For example, 
if the precursor VH gene is known and then additional B cells 
can be found to share most of the 3' side of the CDR3 (the
same DH-JH junction), but have a different VH gene, this can 
be compelling, as in B-ALL or in mouse models with heavy 
chain knock-ins. In contrast, the analysis of VHR in a physi- 
ologic and fully diversified immune repertoire has by necessity 
focused on indirect evidence, namely the enumeration of foot- 
prints, which are potential traces of previous VDJ rearrangements 
in IgH sequence data. In mice, footprints are readily observed 
in constrained immune repertoires [for example, Ref. (13, 15, 
23)]. Footprints are also observed in humans (53). However, a 
fundamental issue with footprint analysis in humans is one of 
specificity of attribution: does the footprint arise due to VHR 
or is it due to some other form of junctional diversification 
or skewing in the rearrangement process? Or does it occur by 
chance? 
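
The precursor-product criterion described above (a shared DH-JH junction paired with different VH genes) can be sketched as a simple grouping step. This is a hedged illustration only: the record layout, the function name, and the `tail_len` parameter (how many 3' CDR3 nucleotides must match) are assumptions, and a real analysis would also need to account for sequencing error and somatic mutation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed record layout: (VH gene, JH gene, CDR3 nucleotide sequence)
Record = Tuple[str, str, str]


def candidate_replacement_groups(
    records: Iterable[Record], tail_len: int = 20
) -> Dict[Tuple[str, str], Set[str]]:
    """Group rearrangements by JH gene plus the last `tail_len` nucleotides of
    the CDR3, and keep only the groups that contain more than one VH gene."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for vh, jh, cdr3 in records:
        seq = cdr3.upper()
        if len(seq) < tail_len:
            continue  # too short to compare the 3' side reliably
        groups[(jh, seq[-tail_len:])].add(vh)
    return {key: vhs for key, vhs in groups.items() if len(vhs) > 1}


# Toy records, invented for illustration.
toy_records = [
    ("VH3-23", "JH4", "TGTGCGAAAGATCGGTACTTTGACTACTGG"),
    ("VH1-69", "JH4", "TGTGCGAGAGACCCGATCGGTACTTTGACTACTGG"),
    ("VH4-34", "JH6", "TGTGCGAGAGGGTACGGTATGGACGTCTGG"),
]
print(candidate_replacement_groups(toy_records, tail_len=8))
```

Groups returned by such a filter are only candidates; as the text notes, the relationship is compelling mainly when the precursor VH is known independently, as in B-ALL or heavy chain knock-in models.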

To investigate the hypothesis that footprint sequences are due 
to the process of VHR, we sequenced IgH rearrangements from 
peripheral blood B cells of a healthy human adult subject, fol- 
lowing an IRB-approved protocol. We identified 42,221 unique 
sequences from this sample, which we analyzed for VH footprints 
using a sliding window method (see Supplementary Material 
for further details). All of the potential footprints arising from 
sequences at the 3' ends of the germline VH genes are listed
in Table 1. In accordance with their conventional description
in the literature (41), we required the footprint to be at least five
nucleotides long (we hereafter refer to these sequences as foot- 
print 5-mers). If footprint 5-mers are due to VHR, they will have 



specific characteristics in antibody repertoire data, indicated by 
the tests described below. 
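
A sliding window scan of the kind mentioned above might look like the following sketch. The details of the authors' method are in their Supplementary Material and are not reproduced here; the function name, the 1-based position convention, and the decision to report overlapping windows separately are assumptions.

```python
from typing import Iterable, List, Tuple


def scan_footprints(
    cdr3: str, footprints: Iterable[str], k: int = 5
) -> List[Tuple[int, str]]:
    """Slide a k-nucleotide window along the CDR3 and report every window that
    matches a footprint k-mer as (1-based start position, k-mer)."""
    targets = {f.upper() for f in footprints}
    seq = cdr3.upper()
    hits: List[Tuple[int, str]] = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in targets:
            hits.append((i + 1, window))
    return hits


# Toy CDR3 and the three 5-mers of the CGAGAGA footprint from Table 1.
toy_cdr3 = "TGTGCGAGAGATCGGTACTGG"
print(scan_footprints(toy_cdr3, ["CGAGA", "GAGAG", "AGAGA"]))
```

Because the 5-mers of one footprint overlap, a single retained footprint can produce several hits; whether these are counted once or several times changes the per-sequence totals, so the counting rule should be stated with any result.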

TEST 1: VH REPLACEMENT FOOTPRINTS SHOULD BE LOCATED IN THE 5' 
END OF THE CDR3 SEQUENCE 

One way to distinguish bona fide footprints from other sources of
sequence variation is to compare the number of footprints in the
junction between VH and DH (referred to as N1) to the number
in the junction between DH and JH (referred to as N2). Footprints
arising via VHR should occur in N1 rather than N2 because the
cryptic heptamer is located in N1. However, as shown in Figure 3A,
there is a roughly bimodal distribution of footprint 5-mers. Even
though we excluded the most common footprints that were found
in the germline DH gene segments (Table 2), there were still plenty
of footprint sequences in N1, DH, and N2. In Figure 3B we took
the analysis one step further and removed additional common
footprint sequences that are found not only in the germline
DH gene segments but also in JH6. This resulted in more skewing
toward N1, but a large proportion of the footprints were still outside
of N1. In fact, not only were footprint 5-mers found in DH
and JH, but they were also found in other parts of the VH gene.
Table 3 lists the positions of all of the footprint 5-mers found
amongst the germline VH alleles listed in the IMGT database.
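
To compare footprint positions across CDR3s of different lengths, Figure 3 places each 5-mer on a normalized scale and rounds to the nearest integer. A minimal sketch of that normalization follows, assuming the start position p is 1-based and is divided by the CDR3 length L before scaling to 100; the function names and the half-up rounding rule are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def normalized_position(p: int, cdr3_length: int) -> int:
    """Map a 1-based start position p in a CDR3 of length L onto a 0-100
    scale as (p / L) * 100, rounded half-up to the nearest integer."""
    if cdr3_length <= 0 or not 1 <= p <= cdr3_length:
        raise ValueError("p must lie between 1 and the CDR3 length")
    return math.floor(p / cdr3_length * 100 + 0.5)


def position_histogram(hits: Iterable[Tuple[int, int]]) -> Counter:
    """Count footprint hits per normalized position; each hit is (p, L)."""
    return Counter(normalized_position(p, length) for p, length in hits)


# A footprint starting at nucleotide 12 of a 48-nucleotide CDR3.
print(normalized_position(12, 48))
print(position_histogram([(12, 48), (6, 24), (40, 50)]))
```

With this convention, hits near the VH-DH junction fall at low normalized values and hits near the DH-JH junction at high values, which is what makes the bimodal pattern described above visible in a single plot.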

TEST 2: VH REPLACEMENT FOOTPRINTS SHOULD BE MORE FREQUENT 
IN UPSTREAM VH GENES THAN DOWNSTREAM VH GENES AND 
ABSENT FROM VH6-1 

Another requirement for a footprint to be consistent with VHR is 
that the invading VH must be upstream of the VH that donated 
the footprint. Unfortunately, the recipient VH is often difficult to 
define because many VHs have the same or very similar footprints 

FIGURE 3 | Positions of footprint 5-mers. (A) Frequency distribution of all
footprint 5-mers out of total unique sequences (n = 42,221) plotted against
the normalized CDR3 position. The CDR3 is herein defined to begin at the
conserved CAR amino acid sequence (TGT GCG AGA nucleotide sequence)
within the 3' end of VH and end at the conserved W (TGG nucleotide
sequence) that is immediately upstream of the first conserved glycine (GGC
nucleotide sequence) within the JH. "TGGAG" is excluded as it is found in
many alleles of all DH genes. The position of the footprint is defined by where
the footprint starts within the CDR3. For example, if a footprint occupies
nucleotides 12-16 of the CDR3, it will be plotted at position 12. CDR3 lengths
were normalized to a scale of 1-100 using p' = (p/L) × 100, where p is the
position of the 5-mer in the real CDR3 sequence, L is the length of the CDR3
sequence, and p' is the normalized position. Normalized positions are
rounded to the nearest integer. (B) Frequency distribution of all footprint
5-mers plotted against the normalized CDR3 position, corrected for footprint
5-mers in the germline JH6 gene. "ATGGA," "TACGG," and "CATGG" were
excluded when found in the 3' end of the CDR3 within the JH6 gene, as
these 5-mers are found in the germline JH6 sequence. Footprint 5-mers
found in other JHs (see Table 2) are not counted in either (A) or (B) because
they are all located outside the CDR3 region of JH.



(see Table 1). However, a more straightforward test of whether 
a footprint 5-mer represents the product of VHR is to evalu- 
ate the frequency of footprints in different VH rearrangements. 
In particular, the 3'-most VH gene (VH6-1, in humans), when
rearranged, should not exhibit VHR footprints as there is no
downstream VH that it can invade. Conversely, VH genes that
are situated in the 5' end of the locus should have higher
frequencies of footprints than 3' VH genes, if VHR is frequent. Yet
the overall frequency of footprint 5-mers was similar amongst 
unique sequences in all of the most commonly used VHs, includ- 
ing VH6-1 (Figure 4). The frequency of footprints was also not 
significantly higher in out of frame (unselected) versus in-frame 
rearrangements (Figure 4A). 
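
The per-VH comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the pipeline used for Figure 4: positions are 1-based and rescaled to 1-100 as in the legend to Figure 3, the 5' and 3' ends are taken as normalized positions up to 20 and above 80 (the exact boundary handling is an assumption), all motifs are assumed to be 5-mers, and the function names and the `(vh_gene, cdr3_sequence)` input format are hypothetical.

```python
from collections import defaultdict


def normalized_position(start, cdr3_length):
    """Rescale a 1-based start position to the 1-100 scale, p1 = (p / L) x 100.

    Integer arithmetic gives round-half-up:
    (200 * p + L) // (2 * L) == floor(100 * p / L + 0.5).
    """
    return (200 * start + cdr3_length) // (2 * cdr3_length)


def end_region(norm_pos):
    """Classify a normalized position as 5' end, 3' end, or middle."""
    if norm_pos <= 20:   # first 20% of the CDR3 (assumed inclusive)
        return "5prime"
    if norm_pos > 80:    # last 20% of the CDR3
        return "3prime"
    return "middle"


def has_end_footprint(cdr3, motifs):
    """True if any footprint 5-mer starts in the 5' or 3' end of the CDR3."""
    cdr3 = cdr3.upper()
    for i in range(len(cdr3) - 4):          # every 5-nucleotide window
        if cdr3[i:i + 5] in motifs:
            pos = normalized_position(i + 1, len(cdr3))
            if end_region(pos) != "middle":
                return True
    return False


def footprint_frequency_by_vh(rearrangements, motifs):
    """Fraction of rearrangements per VH gene with at least one end footprint.

    `rearrangements` is an iterable of (vh_gene, cdr3_sequence) pairs.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for vh_gene, cdr3 in rearrangements:
        totals[vh_gene] += 1
        if has_end_footprint(cdr3, motifs):
            positives[vh_gene] += 1
    return {vh: positives[vh] / totals[vh] for vh in totals}
```

Under the VHR hypothesis, the value returned for VH6-1 would be expected to sit well below the values for upstream VH genes; comparable values across genes point instead to a VHR-independent source of footprints.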

We also performed this analysis using immunoglobulin analysis
tool (IgAT) software (42) and observed that the frequency of
footprints was not reduced in VH6-1 when compared to other
VHs (Figure 4B). Lower VH footprint frequencies were observed
overall because footprints in the 3' end of the CDR3 are excluded
by the IgAT program (42). One intriguing feature of the IgAT data
was that, unlike our footprint analysis that captured 5-mers at
both N1 and N2, when only N1 was analyzed, some VHs, including
VH6-1, had higher footprint frequencies than others. Since
VH6-1 cannot have any footprints due to VHR, we conclude that
many footprint 5-mers that are found in the CDR3 do not arise
by VHR.

The simplest explanation is that the great majority of 5-mer 
sequences found throughout the CDR3 resemble footprints by 
chance. The frequency of footprint 5-mers in the entire CDR3 
was highly correlated with the length of the CDR3 (Figure 2). 
The ability to generate a replacement footprint by chance may be 




Table 2 | Footprint sequences in DH and JH alleles. 



DH gene      Sequence (footprint(s) in red font)
D1-1*01      GGTACAACTGGAACGAC
D1-14*01     GGTATAACCGGAACCAC
D1-20*01     GGTATAACTGGAACGAC
D1-26*01     GGTATAGTGGGAGCTACTAC
D1-7*01      GGTATAACTGGAACTAC
D2-15*01     AGGATATTGTAGTGGTGGTAGCTGCTACTCC
D2-2*01      AGGATATTGTAGTAGTACCAGCTGCTATGCC
D2-2*02      AGGATATTGTAGTAGTACCAGCTGCTATACC
D2-2*03      TGGATATTGTAGTAGTACCAGCTGCTATGCC
D2-21*01     AGCATATTGTGGTGGTGATTGCTATTCC
D2-21*02     AGCATATTGTGGTGGTGACTGCTATTCC
D2-8*01      AGGATATTGTACTAATGGTGTATGCTATACC
D2-8*02      AGGATATTGTACTGGTGGTGTATGCTATACC
D3-10*01     GTATTACTATGGTTCGGGGAGTTATTATAAC
D3-10*02     GTATTACTATGTTCGGGGAGTTATTATAAC
D3-16*01     GTATTATGATTACGTTTGGGGGAGTTATGCTTATACC
D3-16*02     GTATTATGATTACGTTTGGGGGAGTTATCGTTATACC
D3-22*01     GTATTACTATGATAGTAGTGGTTATTACTAC
D3-3*01      GTATTACGATTTTTGGAGTGGTTATTATACC
D3-3*02      GTATTAGCATTTTTGGAGTGGTTATTATACC
D3-9*01      GTATTACGATATTTTGACTGGTTATTATAAC
D4-11*01     TGACTACAGTAACTAC
D4-17*01     TGACTACGGTGACTAC
D4-23*01     TGACTACGGTGGTAACTCC
D4-4*01      TGACTACAGTAACTAC
D5-12*01     GTGGATATAGTGGCTACGATTAC
D5-18*01     GTGGATACAGCTATGGTTAC
D5-24*01     GTAGAGATGGCTACAATTAC
D5-5*01      GTGGATACAGCTATGGTTAC
D6-13*01     GGGTATAGCAGCAGCTGGTAC
D6-19*01     GGGTATAGCAGTGGCTGGTAC
D6-25*01     GGGTATAGCAGCGGCTAC
D6-6*01      GAGTATAGCAGCTCGTCC
D7-27*01     CTAACTGGGGA

JH gene      Sequence (footprint(s) in red font)
J1*01        GCTGAATACTTCCAGCACTGGGGCCAGGGCACCCTGGTCACCGTCTCCTCAG
J2*01        CTACTGGTACTTCGATCTCTGGGGCCGTGGCACCCTGGTCACTGTCTCCTCAG
J3*01        TGATGCTTTTGATGTCTGGGGCCAAGGGACAATGGTCACCGTCTCTTCAG
J3*02        TGATGCTTTTGATATCTGGGGCCAAGGGACAATGGTCACCGTCTCTTCAG
J4*01        ACTACTTTGACTACTGGGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
J4*02        ACTACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG
J4*03        GCTACTTTGACTACTGGGGCCAAGGGACCCTGGTCACCGTCTCCTCAG
J5*01        ACAACTGGTTCGACTCCTGGGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
J5*02        ACAACTGGTTCGACCCCTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG
J6*01        ATTACTACTACTACTACGGTATGGACGTCTGGGGGCAAGGGACCACGGTCACCGTCTCCTCAG
J6*02        ATTACTACTACTACTACGGTATGGACGTCTGGGGCCAAGGGACCACGGTCACCGTCTCCTCAN
J6*03        ATTACTACTACTACTACTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAN
J6*04        ATTACTACTACTACTACGGTATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAG



Thirty-four germline functional human DH alleles and 13 JH alleles were down- 
loaded from the IMGT database (54). Each allele was scanned for all of the 
possible five nucleotide footprint motifs listed in Table 1. Sequences containing 
five nucleotide footprints are given in red font. In some cases the region matches 
more than one possible footprint (for example, D5-12*01 contains three different 
footprints: GGATA, GATAT, and ATATA). 

under-appreciated. In a completely random DNA sequence with
equal proportions of A, T, G, and C bases, the chance of finding a
specific 5-mer sequence at a given position is 1/1,024 (or ~0.001).
However, there are at least 50 different footprint-derived 5-mer
sequences amongst human VH genes (Table 3), increasing the odds
to 50/1,024 (~5%). But this calculation ignores the number of
different positions along the VDJ rearrangement where the footprint
might be detected and how many variants of the footprint are
permitted. If the 5' end of a CDR3 sequence is 30 nucleotides long,
there are 6 completely non-overlapping sequences that
have a length of five nucleotides, bringing the minimum likelihood
of detection of at least a single footprint in that sequence up to
26% [1 - (1 - 0.05)^6], that is, 1 - Pr(not getting any 5-mers in the
30 bp sequence). If the base composition of the DNA is non-uniform or
the entire CDR3 sequence is surveyed or if sequences with mutations
are permitted (for example those matching in 4 out of 5 bp),
the chances of detecting a footprint increase even further.
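
The arithmetic in this paragraph can be checked with a short Python sketch. It assumes, as the text does, uniform base composition and independent, non-overlapping 5-nucleotide windows; the function names are hypothetical and chosen only for illustration.

```python
def p_any_footprint_in_window(n_motifs, k=5, alphabet_size=4):
    """Chance that one k-nucleotide window matches any of n_motifs distinct k-mers."""
    return n_motifs / alphabet_size ** k


def p_at_least_one(p_window, n_windows):
    """Chance of at least one match among n independent windows: 1 - (1 - p)^n."""
    return 1 - (1 - p_window) ** n_windows


# One specific 5-mer: 1/1,024.
p_single = p_any_footprint_in_window(1)

# At least 50 distinct footprint 5-mers: 50/1,024, roughly 5%.
p_window = p_any_footprint_in_window(50)

# A 30-nucleotide stretch holds six non-overlapping 5-nucleotide windows.
p_30bp = p_at_least_one(p_window, 30 // 5)

print(f"single 5-mer: {p_single:.4f}")
print(f"any of 50 footprints in one window: {p_window:.3f}")
print(f"at least one footprint in 30 bp: {p_30bp:.2f}")
```

Using the exact value 50/1,024 or the rounded 0.05 gives the same result to the nearest percent (about 26%). Counting overlapping windows instead of non-overlapping ones would raise the estimate further, consistent with the text's description of 26% as a minimum.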

We also wondered why some VH genes had higher footprint 
frequencies in N1 than others (Figure 4B), as this finding is not
similar to what one would expect by random chance. We won- 
dered if the real VHR events were hiding somewhere in a large pile 
of non-VHR footprints. A high "false positive" rate of footprint 
5-mers could come about because of sequencing errors. Alterna- 
tively or in addition, it may be easy to create false VHR 5-mer 
sequences in primary VDJ rearrangements through a combina- 
tion of N-addition, nibbling (or sequencing deletion) and the 3' 
sequence of the VH. For example AAAGA could become AAGA 
or AGA. 

It may be worthwhile to develop a better computational 
approach for detecting VHR footprints with greater specificity for 
VHR. The IgAT software already eliminated footprints that match 
the germline VH sequence exactly, but this is insufficient, given the
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Table 3 |The number of footprints found in various regions of human VH genes. 



Footprint     Sequences   FR    CDR   FR1   FR2   FR3   CDR1  CDR2  CDR3
TGAGA         131         212   12    94    -     118   3     3     6
CTAGAGA       8           -     8     -     -     -     -     -     8
CGAAAGA       9           -     9     -     -     -     -     -     9
CTAGA         28          17    16    -     17    -     1     1     14
TACGG         9           5     4     1     -     4     1     1     2
CAAAA         45          41    13    -     -     41    -     9     4
CTAGGGA       2           1     1     -     1     -     -     -     1
CAACA         55          33    27    1     5     27    2     23    2
CAAGA         150         149   18    -     4     145   -     3     15
CACACAGA      20          20    6     20    -     -     -     -     6
CGAGACA       7           -     7     -     -     -     -     -     7
CGAGAAA       3           -     3     -     -     -     -     -     3
TGAAAGA       2           -     2     -     -     -     -     -     2
TGAAACA       3           -     3     -     -     -     -     2     1
CGAGAGG       8           2     6     -     2     -     -     -     6
CGAGAGA       86          -     86    -     -     -     -     -     86
CACGG         179         178   6     -     -     178   -     -     6
CGAGATA       3           -     3     -     -     -     -     -     3
CGAAA         11          -     11    -     -     -     -     1     10
TGAAA         61          79    7     33    -     46    1     3     3
CACACAGAC     20          20    5     20    -     -     -     -     5
CGAGA         130         3     127   1     2     -     -     -     127
CACACAGACC    20          20    1     20    -     -     -     -     1
TGAGAAA       6           -     6     -     -     -     -     3     3
CAAAAGATA     3           -     3     -     -     -     -     -     3
CGGCAGA       2           -     2     -     -     -     -     -     2
TGAGAGA       3           1     2     -     -     1     -     -     2
CCAGATATA     2           -     2     -     -     -     -     -     2
CAAGATA       1           -     1     -     -     -     -     -     1
CCAGAGA       84          80    4     -     -     80    -     -     4
CACGGATAC     5           -     5     -     -     -     -     -     5
CAAGAGA       27          19    8     -     -     19    -     -     8



Two hundred and thirty-four functional germline human VH alleles were downloaded from the IMGT database (54). For each footprint, the number of times that
footprint was found in the VH alleles was recorded. The first column lists the footprint, the second column lists the number of alleles in which the footprint is found. 
Some alleles contain more than one footprint. The remaining columns indicate how many footprints were found in each of the corresponding regions of the V region. 
FR, framework; CDR, complementarity determining region. The columns named FR and CDR provide the total number of times that a particular footprint is found in 
the FR or the CDR. 

high frequency of footprints in N1 of VH6-1. Further specificity
might be achievable if one were to limit the detection of footprint
5-mers to sequences that are unlikely to arise through a single
nucleotide change (such changes include a deletion that
converts a non-matching 6-mer to a matching 5-mer, mutation
of a non-matching 5-mer to a footprint 5-mer, or N-addition
of a nucleotide adjoining a 4-mer to create a footprint 5-mer).
An alternative approach is to require that there be two
footprint-like sequences in tandem. Either or both of these methods might
increase specificity, but could also reduce sensitivity of footprint
detection. The validity of either approach would need to be tested
further using validated data sets in which VHR events are known
to have occurred or not to have occurred.

We also considered the possibility that footprint 5-mers may
frequently arise through some mechanism other than VHR.
We considered two potential alternative mechanisms: (1)
microhomology-mediated joining and (2) cleavage, nibbling, and
rejoining at the cryptic heptamer.

ALTERNATIVE THEORY 1: MICROHOMOLOGY-MEDIATED JOINING

We considered the possibility that footprints at N1 were arising
primarily due to microhomology-mediated joining of similar
sequences between the VH and the DH segments. If
microhomology-mediated joining were common, one might
expect that VHs that share the same 5-mers with DHs are more
likely to rearrange, but as shown in Figure 5, this is probably
not usually the case. DH5-12 (open bars in Figure 5), which
has three footprint 5-mers, does not appear to be used more
frequently in rearrangements involving VHs that contain the
same 5-mers such as VH2-26 (red arrow). Rather than being
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IFVH □ OFVH □ total VH 




FIGURE 4 | (A) Percentage of rearrangements with footprints for the 10
most frequent VH rearrangements and VH6-1. The percentages of
rearrangements that contain at least one footprint 5-mer in either (or both)
the 5' end or the 3' end of the rearrangement are shown for unique
rearrangements for each of the 10 most common VH genes and for VH6-1.
CDR3 sequences are defined as described in the legend to Figure 3 and
normalized to an arbitrary length of 100. The 5' end (which almost always
contains N1) is defined as the first 20% of the sequence and the 3' end is
defined as the last 20% of the sequence (which almost always contains
N2). VH6-1 is the most 3' VH gene and cannot contain footprints that are
due to VH replacement. Black bars denote unique rearrangements that are
in-frame (IFVH), white bars denote out of frame rearrangements (OFVH),
and gray bars indicate total unique rearrangements (total VH). (B) Footprint
frequency of in-frame VH rearrangements in N1 using IgAT software. The
same unique in-frame (IF) VH rearrangements, as shown in (A), were
analyzed for the presence of one or more footprints using IgAT software (42).
Plotted are the frequencies of unique IFVH rearrangements that have one
or more footprints in the N1 region.



skewed toward particular VHs with matching or similar footprints, 
the frequency of rearrangements of different VHs to DH5-12 
rearrangements resembled overall VH usage (Figure 5, closed 
bars). While this analysis is very preliminary and only focused 
on a single DH, it suggests that microhomology-mediated join- 
ing, based upon shared sequences between VH and DH, is not a 
frequent mechanism for generating footprint 5-mers. 
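
One simple way to put a number on "resembled overall VH usage" is to compare the two usage profiles directly. The Python sketch below uses the total variation distance between normalized VH usage frequencies; this particular metric and the function names are assumptions made for illustration and are not taken from the analysis behind Figure 5. The inputs are dictionaries mapping each VH gene to its count of unique rearrangements.

```python
def usage_fractions(counts):
    """Convert {vh_gene: count} into {vh_gene: fraction of all rearrangements}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rearrangements to normalize")
    return {gene: n / total for gene, n in counts.items()}


def total_variation_distance(counts_a, counts_b):
    """Half the summed absolute difference between two VH usage profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles that share
    no VH genes at all.
    """
    frac_a = usage_fractions(counts_a)
    frac_b = usage_fractions(counts_b)
    genes = set(frac_a) | set(frac_b)
    return 0.5 * sum(abs(frac_a.get(g, 0.0) - frac_b.get(g, 0.0)) for g in genes)
```

A distance close to 0 between the DH5-12 subset and all unique rearrangements would agree with the interpretation given above, whereas a large value driven by VHs that share 5-mers with DH5-12 would favor microhomology-mediated joining. With a subset of about a thousand sequences, sampling noise alone produces a non-zero distance, so a resampling baseline would be needed before reading much into the number.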

ALTERNATIVE THEORY 2: CLEAVAGE, NIBBLING, AND REJOINING AT 
THE CRYPTIC HEPTAMER IN VH 

We wondered if there could be cleavage at the cryptic heptamer,
followed by exonucleolytic nibbling and re-sealing at the site of
the break, without full-blown rearrangement (Figure S1 in
Supplementary Material illustrates this idea for a VH6-1 rearrangement).
Note that this type of rearrangement product would not involve
VHR, but would have the result of diversifying the 3' end of the
VH in the primary rearrangement product, altering the primary
amino acid sequence and/or the reading frame of the rearrangement.
This hypothesis makes predictions regarding the sequence
characteristics that would be more or less amenable to this type of
atypical open-and-shut joint (2). For example, one would expect
that if most footprint 5-mers at N1 arise by this mechanism,
the frequency of footprint 5-mers would be very low in VH genes
that lack cryptic heptamers. Furthermore, one would expect that
the 5' footprint 5-mer seen in most rearrangements would resemble
the 3' end of the germline sequence of the same VH gene that
is present in the rearrangement.

PRELIMINARY CONCLUSIONS AND CAVEATS FROM FOOTPRINT 
ANALYSIS 

Taken together, these data suggest that many if not most foot- 
print sequences arise by some mechanism(s) other than VHR. But 
there are some caveats to this analysis. First, these data were
obtained from only one healthy adult. It is possible that footprints
differ in other individuals or in a minority of individuals. In addition,
different findings might occur in individuals with immunologic 
disorders such as SLE or neoplastic conditions such as B-ALL. 
Furthermore, only B cells from the peripheral blood were ana- 
lyzed. It is conceivable that B cell populations with extensive VHR 
reside elsewhere in the body, particularly within the bone marrow. 
Finally, as discussed above, it is possible that some of the VHR 
footprints that were identified are due to sequencing errors. We 
tried to protect against this artifact by selectively analyzing unique 
sequences that were present in at least two copies. But even with 
this precaution, there are still likely to be many sequencing errors. 

CONCLUSION 

VH replacement exchanges the VH within a pre-existing VDJ 
rearrangement with an upstream VH gene, while preserving most 
of the original CDR3 sequence. It also sometimes results in the 
retention of a footprint sequence in the VH gene that was invaded. 
The result of VHR is an alteration in the specificity or functional 
status of the antibody. But the mechanistic consequence of that 
alteration is unclear. Is it to diversify the repertoire once a good 
CDR3 sequence has been found? Or is it to reduce autoreactivity 
or generate some form of protective multireactivity? Or is it sim- 
ply a means by which B cells with non-productive rearrangements 
on one or both alleles have another shot at creating a productive 
rearrangement? In humans, the analysis of VHR is confounded 
by not having a means of definitively identifying the precursor 
rearrangement. Rather, the analysis of VHR in humans is accom- 
plished indirectly through footprint analysis, but as demonstrated 
herein, footprints may arise for reasons other than VHR. Thus, 
while VHR certainly occurs, footprint analysis is a poor measure 
of its frequency because of the high rate of false positives and 
an unknown rate of false negatives. Nevertheless, it is possible 
that footprints may provide other insights into the mechanisms 
of V(D)J recombination and its potentially aberrant regulation in 
disease states. With the advent of high throughput sequencing 
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FIGURE 5 | VH usage is similar in all unique rearrangements and in rear- 
rangements with DH5-12. VH usage is shown for total unique sequences 
(closed bars, n = 42,221) and for those with DH5-12 (open bars, n = 1,029). 
Sequences containing the DH5-12 gene segment in the CDR3 region were
recognized using IMGT/HighV-QUEST analysis, and further analysis on VH
usage was performed in-house (see text). The red arrow points to VH2-26,
which contains some of the same footprint 5-mers as DH5-12, but does not
appear to be used more frequently in rearrangements that include DH5-12.



studies, further analysis of IgH gene rearrangements for VHR
and other mechanisms of CDR3 diversification promises to be
illuminating.
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abstract 

Figure S1 | Model for atypical open-and-shut joints. Footprint 5-mers can be 
generated by cleaving at the cryptic heptamer, followed by preferential trimming 
by exonucleolytic nibbling at the 3' end of the double strand break. Shown is the 
generation of a footprint sequence in a VH6-1 rearrangement. The cryptic 
heptamer is indicated by a dashed triangle; colored squares indicate the VH,
DH, and JH gene segments (not drawn to scale). According to this model, there 
is cleavage at the cryptic heptamer, followed by nibbling at the 3' end of the 
break (red wavy line), leading to selective loss of the A residue. The third line of 
the figure shows the repaired rearrangement, in which the A residue is missing 
from the rearrangement. 
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